pyconfig → pydantic #1836
base: main
Conversation
richjames0 left a comment
LGTM pending line lengths, which make it hard to read.
bvandermoon left a comment
What do the file name letters stand for? types_j, types_g, etc.
@bvandermoon Oh, ignore that. Each is a different attempt (in lexicographical order). All will be removed with a singular …
…configuration files in MaxText
- [src/MaxText/pyconfig.py] New temporary wrapper to not break existing API
- [src/MaxText/pyconfig_og.py] Move original version here
- [src/MaxText/configs/__init__.py] Make this a module
- [tests/pyconfig_test.py] Import from og pyconfig
- [*requirements*.txt] Add pydantic requirement
- [tests/configs_test.py] Test every config in the repo
- [tests/configs_value_test.py] Test various config values
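The temporary wrapper called out above could be as small as a re-export shim. A minimal sketch, assuming the legacy module keeps its public entry points (the `initialize` name is an assumption; the actual shim in the PR is not shown in this excerpt):

```python
# Hypothetical sketch of src/MaxText/pyconfig.py as a compatibility shim.
# The real wrapper in the PR may differ; `initialize` is assumed to be the
# legacy entry point and is not confirmed by this excerpt.
"""Temporary wrapper so existing `from MaxText import pyconfig` call sites
keep working while the pydantic-based configs land."""

from MaxText.pyconfig_og import *  # noqa: F401,F403  (re-export legacy API)
from MaxText.pyconfig_og import initialize  # noqa: F401  (explicit re-export)
```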
Description
Background
MaxText is currently engineered around pyconfig. pyconfig (https://pypi.org/project/pyconfig/) was last updated in 2017 and has 20k monthly downloads. Pydantic (https://pypi.org/project/pydantic/) is constantly updated and has hundreds of millions of monthly downloads (https://docs.pydantic.dev).
Summary
The TL;DR version is: typed, validated config classes that can drive --help output and GUIs, and generate SQL models (SQLAlchemy) as desired.

This introduces 487 new pydantic.fields.Field definitions across 78 classes (65 inheriting from pydantic.BaseModel) to replace the old untyped, undocumented system.
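To make the shape of the change concrete, here is a minimal sketch of the pydantic pattern the summary describes. The class and field names are illustrative (loosely modeled on base.yml-style options), not the actual classes from this PR:

```python
from pydantic import BaseModel, Field

class TrainingConfig(BaseModel):
    """Typed, documented stand-in for one slice of the old untyped config."""

    model_name: str = Field("default", description="Model preset to load.")
    per_device_batch_size: float = Field(1.0, gt=0, description="Batch size per device.")
    max_target_length: int = Field(2048, ge=1, description="Maximum sequence length.")
    scan_layers: bool = Field(True, description="Whether to scan over decoder layers.")

# Values are validated at construction time rather than at first use:
cfg = TrainingConfig(per_device_batch_size=1, max_target_length=256)

# The generated JSON schema is what enables --help text, GUIs, and
# SQL model generation downstream:
schema = TrainingConfig.model_json_schema()
```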
Migration

Proposed changes to the MaxText codebase:
- from_pyconfig_to_pydantic (see the sketch after this list)
- a single types.py, or one types.py per config occurrence (e.g., one per module if each module has a different config)
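from_pyconfig_to_pydantic is only named in this excerpt, so the following is a guess at its shape rather than its real signature; it assumes legacy pyconfig objects behave like flat key/value mappings:

```python
from pydantic import BaseModel

def from_pyconfig_to_pydantic(legacy_config, model_cls: type[BaseModel]) -> BaseModel:
    """Bridge a legacy pyconfig-style mapping into a validated pydantic model.

    Hypothetical helper: the real function in the PR may take different
    arguments or return a different type.
    """
    # pydantic v2: model_validate accepts any mapping and runs full validation.
    return model_cls.model_validate(dict(legacy_config))
```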
Tests

CI and manual:
TL;DR version, these worked locally:
- default
- mistral-7b
- deepseek3-tiny
- gemma-2b
- gemma2-2b
- qwen3-0.6b
- qwen3-4b
- qwen3-4b-thinking-2507
- gpt3-6b
- gpt3-52k

And these worked via xpk on the cluster:
- default_basic_1
- default_32
- default_64
- default_128
- default_256
- default_512
- gpt_3_175b
- gpt_3_175b_bf16
- llama2_7b_4096
- llama2_70b_4096
- llama2_70b_4096_synthetic
- llama2_70b_4096_sc
- llama2_70b_4096_sc_real_data_tfds
- llama2_70b_4096_sc_real_data_grain
- llama2_70b_4096_sc_real_data_grain_checkpoint
- llama2_70b_4096_rd_lr
- llama3_8b_8192
- llama3_70b_8192
- llama3_1_405b_8192_fsdp_dcn
- llama3_1_405b_8192_pure_fsdp_ici
- llama3_1_8b_8192
- llama3_1_8b_8192_bs5
- llama3_1_8b_8192_no_collective_matmul
- llama3_1_70b_8192
- llama3_1_70b_8192_bs2
- llama3_1_70b_8192_bs2_bfloat16_no_collective_matmul
- llama3_1_70b_8192_bs4
- llama3_1_70b_8192_synthetic
- llama3_1_70b_8192_rd_grain
- llama3_1_70b_8192_synthetic_ckpt
- llama3_1_70b_8192_rd_ckpt_grain
- llama3_1_70b_8192_pw_lr_rd
- llama3_1_70b_8192_iter_real_data_and_checkpointing_tfds
- llama3_1_70b_8192_synth
- llama3_1_70b_129024
- mistral_7b
- mixtral_8x7b_dropless
- mixtral_8x7b_dropped
- mixtral_8x7b_dropped_int8
- mixtral_8x22b_dropped
- deepseek_v3_ep16
- gemma2_9b_8192
- gemma2_27b_8192
- gemma3_12b_32768_v6e256
- gemma3_12b_32768_2x_v6e256
- gemma3_12b_32768_4x_v6e256
- llama3_1_70b_131072
- custom_moe_700b
- llama3_1_405b_8192_v5p_1024
- deepseek_v3_ep_256_v5p_512
- llama4_scout_dropless_v5p_256
- llama4_maverick_dropless_v5p_256
- llama2_70b_v5p_128
- llama2_7b_v5p_128
- gpt_3_175b_v5p_128
- gpt_3_175b_v5p_128_sc
- deepseek3_671b_v5p_1024
- default_16b_v5e_256
- default_32b_v5e_256
- default_64b_v5e_256
- default_128b_v5e_256
- gpt_3_175b_v5e_256
- llama2_7b_v5e_256
- llama2_13b_v5e_256
- llama2_70b_v5e_256
- llama3_1_8b_8192_v5e_256
- deepseek_v3_ep_256_v5p_512_c4mlperf

Also manually ran this to test:
```sh
python3 -m MaxText.decode MaxText/configs/base.yml \
  model_name=llama2-7b \
  tokenizer_path=src/MaxText/assets/tokenizer_llama3.tiktoken \
  tokenizer_type=tiktoken \
  scan_layers=false \
  per_device_batch_size=1 \
  ici_fsdp_parallelism=1 \
  ici_autoregressive_parallelism=-1 \
  max_prefill_predict_length=128 \
  max_target_length=256 \
  prompt="I love to" \
  attention=dot_product
```
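For the new repo-wide config test (tests/configs_test.py per the commit message), here is a sketch of what a parametrized pytest might look like. The config directory and the use of pyconfig.initialize are assumptions; the actual test in this PR is not shown in this excerpt:

```python
# Illustrative sketch of tests/configs_test.py; the actual test in the PR
# may differ. CONFIG_DIR and the pyconfig.initialize call are assumptions.
from pathlib import Path

import pytest

from MaxText import pyconfig

CONFIG_DIR = Path("src/MaxText/configs")  # assumed location of the YAML configs

@pytest.mark.parametrize("config_path", sorted(CONFIG_DIR.rglob("*.yml")), ids=str)
def test_every_config_validates(config_path):
    # With pydantic-backed configs, simply loading each file exercises
    # type checking and validation; a bad config should raise here.
    pyconfig.initialize(["configs_test", str(config_path)])
```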
Checklist

Before submitting this PR, please make sure (put X in square brackets):